Control of Allergic Rhinitis and Asthma Test: A systematic review of measurement properties and COSMIN analysis

Abstract The Control of Allergic Rhinitis and Asthma Test (CARAT) is a patient‐reported outcome measure (PROM) assessing the control of asthma and allergic rhinitis (AR) over the previous 4 weeks. This systematic review aimed to evaluate the measurement properties of CARAT. Following PRISMA and COSMIN guidelines, we searched five bibliographic databases and retrieved studies concerning the development, assessment of properties, validation, and/or cultural adaptation of CARAT. The studies' methodological quality, the quality of measurement properties, and the overall quality of evidence were assessed. We performed meta‐analysis of CARAT measurement properties. We included 16 studies. The Control of Allergic Rhinitis and Asthma Test displayed sufficient content validity and very good internal consistency (meta‐analytical Cronbach alpha = 0.83; 95% CI = 0.80–0.86; I² = 62.6%). Its meta‐analytical intraclass correlation coefficient was 0.91 (95% CI = 0.64–0.98; I² = 93.7%). It presented good construct validity, especially for correlations with PROMs assessing asthma (absolute Spearman correlation coefficients range = 0.67–0.73; moderate quality of evidence), and good responsiveness. Its minimal important difference is 3.5. Overall, CARAT has good internal consistency, reliability, construct validity and responsiveness, despite the heterogeneous quality of evidence. The Control of Allergic Rhinitis and Asthma Test can be used to assess the control of asthma and AR. As the first of its kind, this meta‐analysis of CARAT measurement properties sets a stronger level of evidence for asthma and/or AR control questionnaires.


| INTRODUCTION
Patient-reported outcome measures (PROMs) have been developed to quantify the perceived impact of a specific disease or group of diseases from the patient's perspective. 1,2 PROMs may provide valuable insights into several disease domains, from the perceived effectiveness of treatments to the quality of life, being crucial to guiding clinical decisions. 3,4 For assessing the control of asthma or allergic rhinitis (AR), several PROMs are available, including the Asthma Control Test (ACT), 5 the Asthma Control Questionnaire (ACQ), 6 the Allergic Rhinitis Control Test (ARCT), 7 and the Rhinitis Control Assessment Test (RCAT). 8,9 All of these PROMs assess asthma and AR separately.
However, most patients with asthma also have AR 10,11 and there is a need to simultaneously evaluate these two conditions. 12 The Allergic Rhinitis and Its Impact on Asthma (ARIA) initiative suggests that both conditions should be holistically evaluated using a single tool. 13,14 To the best of our knowledge, the Control of Allergic Rhinitis and Asthma Test (CARAT) is the only PROM assessing the control of both asthma and AR (other PROMs developed to be used in patients with asthma and AR either focus on quality of life 15 or screening of AR in asthmatic patients 16 ). It has 10 questions addressing upper and lower airway symptoms, sleep disturbances, limitation of activities, and the need to increase medication in the previous 4 weeks. The total score ranges from 0 to 30 points with scores above 24 points indicating good control of both conditions. 17 Control of Allergic Rhinitis and Asthma Test development has been thoroughly documented and has been independently assessed by several studies. [18][19][20] Moreover, CARAT has been widely used in clinical practice and in scientific research, which led to its prompt translation and cross-cultural adaptation based on international recommendations and best practices. [20][21][22] It may be administered on paper during medical visits, but it is also available in digital versions, through a website 23 and mHealth apps for asthma and AR, 24,25 allowing the patient to use it between clinical assessments.
Hence, the growing use of CARAT prompts the need for a systematic assessment of its measurement (psychometric) properties.
Therefore, the purpose of this systematic review was to objectively evaluate the measurement properties of CARAT using the COnsensus-based Standards for the selection of health status Measurement Instruments (COSMIN) methodology for systematic reviews of PROMs guidelines. 26
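The scoring rule described above (10 questions, a total of 0 to 30 points, and good control above 24 points) can be illustrated with a minimal sketch. The per-item range of 0 to 3 is an assumption inferred from the 10 items and the 30-point maximum, and the function names are hypothetical; this is not the official CARAT scoring code.

```python
from typing import Sequence

N_ITEMS = 10          # CARAT has 10 questions (stated in the text)
MAX_ITEM_SCORE = 3    # ASSUMPTION: each item is scored 0-3, giving a 0-30 total
CONTROL_CUTOFF = 24   # totals above 24 indicate good control (stated in the text)


def carat_total(item_scores: Sequence[int]) -> int:
    """Return the total score after checking item count and item range."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if not 0 <= score <= MAX_ITEM_SCORE:
            raise ValueError(f"item score {score} is outside 0-{MAX_ITEM_SCORE}")
    return sum(item_scores)


def is_well_controlled(total: int) -> bool:
    """Apply the 'above 24 points' threshold for good control of both conditions."""
    return total > CONTROL_CUTOFF
```

For example, `is_well_controlled(carat_total([3] * 10))` evaluates a patient with the maximum score on every item.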

| Study design
This systematic review with meta-analysis was reported according to the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 27 and the COSMIN methodology for systematic reviews of PROMs guidelines. 26 The COSMIN methodology has specific recommendations on the assessment of the risk of bias in primary studies, on the rating of measurement properties and on the assessment of the overall quality of evidence for each measurement property. 26

| Selection criteria
We included original studies (i) assessing adolescents (aged 12 years and older) or adults with asthma and/or AR, and (ii.a) which concerned the development, assessment of properties (such as validity, reliability, consistency and responsiveness), and/or cultural adaptation and validation of CARAT or (ii.b) in which this questionnaire was used simultaneously with other PROMs as a study endpoint, and (iii) which used CARAT to assess asthma and/or AR control with a 4-week recall. We excluded reports available solely as conference abstracts, as recommended by COSMIN. 26 No restrictions based on publication date or language were applied.

| Study selection
After eliminating duplicates, two independent authors (RJV and CJ) screened articles' titles and abstracts. The full texts of articles not excluded in the screening phase were independently read by two authors (RJV and ACF). Efforts to contact the investigators were made whenever publications were not accessible by other means.
Articles in a language unknown to the reviewers were translated to English either by native speakers of that language or by using an online translator tool. 28 Any disagreement between the authors was solved by consensus.

| Data extraction
The following data were independently extracted from each included primary study by two authors (RJV and ACF) into a purposely built form: sample size, distribution of participants' age and gender, frequency of patients with AR and/or asthma, setting (e.g., primary care, secondary care…), country and language of questionnaire administration. In addition, we retrieved information on the results obtained by each primary study on the measurement properties of CARAT.
When more than one report assessed the same participants (or overlapped in the assessed participants), information was retrieved from the article assessing a larger sample, and the remaining articles were screened for additional information not presented in the main article.

| Quality assessment
The measurement properties of CARAT were assessed by two independent authors (RJV and ACF) using the COSMIN methodology. 26 The evaluation of such properties comprised (i) the assessment of the methodological quality of primary studies, (ii) the overall rating of the measurement properties of CARAT, and (iii) the assessment of the generated quality of evidence. The rates applied for each domain are stated in Supplementary Table S2 and further explained below.
The methodological quality of primary studies concerns the risk of bias assessment of the included studies (including those concerning the development of CARAT) regarding each psychometric property on items such as the adoption of the most adequate statistical procedures and measures, sampling and study size. It is rated from 'very good (V)' to 'inadequate (I)' using the COSMIN risk of bias checklist. 26
A 'sufficient' (+) or 'insufficient' (−) rating was given if >75% of results were concurrent, an 'inconsistent' (±) rating was given if no rating exceeded 75% and no appropriate explanation for inconsistency could be given, and an 'indeterminate' (?) rating was given if all single study results were indeterminate. 26 Finally, the quality of evidence concerns the confidence in the summarized results based on the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. It was rated as high, moderate, low, or very low, taking into account the methodological quality of the studies, the inconsistency of results across studies, imprecision, and indirectness. 26,31
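The pooling rule for overall ratings described above can be sketched as follows. This is an illustration under stated assumptions, not the COSMIN reference procedure: the function name is hypothetical, proportions are computed over all rated studies (including indeterminate ones), and the judgement on whether an inconsistency can be explained is not modelled.

```python
from collections import Counter
from typing import Sequence


def overall_rating(study_ratings: Sequence[str], threshold: float = 0.75) -> str:
    """Pool per-study ratings ('+', '-', '?') into one overall rating.

    Returns '+' or '-' when more than `threshold` of the studies agree,
    '?' when every study is indeterminate, and '±' (inconsistent) otherwise.
    """
    if not study_ratings:
        raise ValueError("at least one study rating is required")
    if all(rating == "?" for rating in study_ratings):
        return "?"
    counts = Counter(study_ratings)
    n_studies = len(study_ratings)
    # ASSUMPTION: the denominator includes indeterminate studies.
    for symbol in ("+", "-"):
        if counts[symbol] / n_studies > threshold:
            return symbol
    return "±"
```

With four 'sufficient' studies out of five (80%), the sketch returns '+'; with three out of four (exactly 75%), it returns '±', because the rule requires more than 75% agreement.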

| Data analysis
To perform a quantitative synthesis of evidence on the internal consistency, reliability, construct validity and responsiveness of CARAT and its subscales ('CARAT upper airway' and 'CARAT lower airway'), we performed meta-analyses of Cronbach alphas (internal consistency), intraclass correlation coefficients (ICC; reliability) and Spearman correlation coefficients (construct validity and responsiveness). We were not able to perform a meta-analysis of other properties (e.g., measurement error), due to an insufficient number of included primary studies assessing such properties.
We applied the random-effects model, using the restricted maximum likelihood method. No primary study presented confidence intervals or standard errors along with their effect size measures.
Therefore, for performing meta-analysis of Spearman correlation coefficients and ICCs, coefficients were first transformed according to the formula $z = 0.5 \times \ln\left(\frac{1 + r}{1 - r}\right)$, where $r$ is the coefficient being transformed; meta-analytical results were then back-transformed into the natural scale. For performing meta-analysis of Cronbach alphas, variances were estimated based on computed confidence interval limits. 33 Heterogeneity was assessed using the I² statistic and the p-value for the Cochran Q statistic; an I² >50% and a p-value <0.10 were deemed to represent substantial heterogeneity. Whenever information was available, sensitivity analyses were performed for patients with asthma and patients without asthma. In addition, to ensure inclusion of studies with similar methodology, for outcomes assessed by primary studies using different data retrieving strategies (e.g., outpatient consultation with physicians versus patient self-reporting through mHealth tools), our main meta-analytical results were those not considering mHealth data.
All analyses were performed using the metafor package of software R (version 4.0).
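As an illustration of the pooling steps described above, the following sketch transforms correlation coefficients with Fisher's z, pools them under a random-effects model, computes I², and back-transforms the result. It makes several assumptions that are not taken from the study: it is written in plain Python rather than R's metafor, it uses the simpler DerSimonian-Laird estimator of between-study variance instead of restricted maximum likelihood, and it approximates the variance of each transformed coefficient as 1/(n − 3). All function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def fisher_z(r: float) -> float:
    """Fisher transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def inverse_fisher_z(z: float) -> float:
    """Back-transform a z value to the correlation scale."""
    return math.tanh(z)


def pool_correlations(rs: Sequence[float], ns: Sequence[int]) -> Dict[str, float]:
    """Random-effects pooling of correlations (DerSimonian-Laird, NOT REML)."""
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need matching lists with at least two studies")
    zs = [fisher_z(r) for r in rs]
    variances = [1.0 / (n - 3) for n in ns]  # ASSUMPTION: var(z) ~ 1 / (n - 3)

    # Fixed-effect weights and Cochran's Q statistic.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    z_fixed = sum(w * z for w, z in zip(weights, zs)) / sum_w
    q = sum(w * (z - z_fixed) ** 2 for w, z in zip(weights, zs))
    df = len(zs) - 1

    # Between-study variance (tau^2) and the I^2 heterogeneity statistic.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects pooled estimate with a 95% confidence interval.
    re_weights = [1.0 / (v + tau2) for v in variances]
    z_pooled = sum(w * z for w, z in zip(re_weights, zs)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return {
        "r": inverse_fisher_z(z_pooled),
        "ci_low": inverse_fisher_z(z_pooled - 1.96 * se),
        "ci_high": inverse_fisher_z(z_pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

Pooling on the z scale keeps the confidence limits inside the valid range of a correlation after back-transformation, which is the usual reason for transforming before pooling.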

| Study selection
Our database search returned a total of 283 search results (Figure 1).

| Methodological quality of primary studies
There was variation in the methodological quality ratings for each psychometric property in each individual study, but overall we found a low risk of bias for all assessed psychometric properties (Table 2).
The quality of PROM development for CARAT (Table 3) was rated based on its development study. 34 The ratings for the general design requirements ranged from 'adequate' to 'very good'. Regarding concept elicitation, although data collection methods were deemed 'very good' and a skilled interviewer was used, meetings were recorded but not transcribed verbatim. 34 For the same reason, the assessment of comprehensibility in the pilot test was rated as 'doubtful'. Comprehensiveness was not assessed in a pilot test, but only in the development process.
Supplementary Table S3 and Supplementary Table S4 display the results for the overall rating and quality of evidence assessment for CARAT. Meta-analytical results are available in detail in Table 4 and Supplementary Table S5, and summarized in Figure 2. It was possible to assess all measurement properties at least once, except for cross-cultural and criterion validity; regarding the latter, we considered the comparisons between the scores in CARAT and other PROMs as evidence for construct validity (and not of criterion validity), as per the COSMIN guidelines. 26

| Content validity
The assessment of content validity was solely based on the development study 34 and the authors' opinions (Supplementary Table S3). We rated content validity as 'sufficient', albeit with very low quality of evidence, due to the absence of independent individual studies assessing the content validity of CARAT.

| Structural validity
Three studies assessed the structural validity of the CARAT, 20,22,35 which confirmed the two-factorial scale structure of CARAT.

| Internal consistency
The internal consistency of CARAT was assessed in 7 studies (published in 8 reports). 17

| Cross-cultural validity
No multiple group factor analysis or differential item functioning was

| Interpretability and feasibility
Interpretability of CARAT is summarized in Supplementary Table S7.
Overall, the percentage of missing items was low in all studies (between 0% and 9.7%). The percentage of participants reaching the maximum score (ceiling score) ranged between 2.6% and 8.7%.
Feasibility is described in Supplementary Table S8. The Control of Allergic Rhinitis and Asthma Test consists of 10 questions, which take less than three minutes to complete. Its use for individual purposes is free and does not require any prior authorization for clinical use.
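The interpretability indicators reported above (percentage of missing items and percentage of participants at the ceiling score) can be computed along these lines. This is a minimal sketch with hypothetical function names; it assumes that missing answers are stored as `None` and that the percentage of missing items is taken over all item responses, which may differ from how individual primary studies defined it.

```python
from typing import Optional, Sequence


def ceiling_percentage(totals: Sequence[int], max_score: int = 30) -> float:
    """Percentage of participants whose total equals the maximum score."""
    if not totals:
        raise ValueError("at least one total score is required")
    n_ceiling = sum(1 for total in totals if total == max_score)
    return 100.0 * n_ceiling / len(totals)


def missing_item_percentage(responses: Sequence[Sequence[Optional[int]]]) -> float:
    """Percentage of item responses that are missing (None).

    ASSUMPTION: the denominator is every item of every participant.
    """
    n_items = sum(len(row) for row in responses)
    if n_items == 0:
        raise ValueError("at least one item response is required")
    n_missing = sum(1 for row in responses for value in row if value is None)
    return 100.0 * n_missing / n_items
```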

| DISCUSSION
This is the first systematic review, following the COSMIN guidelines, 26,29 to assess and summarise the measurement properties of a PROM for asthma and/or AR, namely CARAT.
Overall, we found that CARAT shows high internal consistency, reliability, construct validity and responsiveness, despite the heterogeneity of the included studies. These results indicate that CARAT can be successfully used to assess the control of asthma and AR with a 4-week recall.
Control of Allergic Rhinitis and Asthma Test was originally developed in Portuguese. 34,35 Physicians and patients were involved in its development stage. Patient involvement is crucial to ensure that questionnaires include patients' perspectives and are tailored to their needs. 51 In the development of CARAT, 60 individual interviews were performed by a trained psychologist, but interviews were not
Content validity refers to whether the content of an instrument appropriately reflects the construct that is being measured and it is often considered the most important measurement property of an instrument. 26,29 Based on the results from the development study and an independent assessment by two reviewers, we deemed the content validity as 'sufficient', as CARAT meets all the criteria for relevance and comprehensibility. Comprehensiveness was not assessed in its development study, but CARAT follows the ARIA 14 and the Global Initiative for Asthma (GINA) 53 guidelines and the reviewers independently agreed that CARAT includes all the key concepts for the assessment of asthma and AR control. However, there was insufficient evidence on the content validity of CARAT as it was only explored in the development study. 34 Importantly, the cross-cultural adaptation of CARAT required that a sample of the target population be asked about the relevance and comprehensibility of the questionnaire, 12 but results were not reported in primary studies. The lack of independent studies assessing the content validity was also a limitation observed in systematic reviews of other PROMs. 1,52 In fact, it has been previously recognized that content validity has not been rigorously demonstrated for most asthma PROMs. 54 To the best of our knowledge, there are two published systematic reviews assessing the PROMs commonly used in asthma 13,55 and one for AR, 56 none of which evaluated the content validity of the included PROMs.
Therefore, our systematic review is the first to systematically assess the content validity of a PROM for asthma and AR.
'Sufficient' structural validity is a prerequisite for assessing in-
There is no comparable gold standard assessing asthma and AR control. As a result, we considered the comparisons between CARAT and other validated PROMs to be evidence for construct validity (namely convergent validity), as per the COSMIN guidelines. 26 We found good correlations with low heterogeneity between the CARAT score and other PROMs, namely the VASs and ACQ-5. Likewise, we found a good correlation for the comparison between the CARAT score and the ACT, but with substantial heterogeneity, which can be partly explained by the inclusion of patients with AR without asthma in the primary studies (who were less represented in studies assessing correlation with ACQ-5). Indeed, when performing sensitivity analysis and quantitatively pooling the results only from studies including patients with asthma, we found that the heterogeneity was greatly decreased. Regarding the lower correlation with the EQ-5D VAS, it is important to note that CARAT and EQ-5D measure related but dissimilar constructs, and that EQ-5D may not be the best quality of life measure to be used in asthma 57 as (i) it does not react very sensitively to small changes in asthma control, 58 (ii) VAS EQ-5D is less sensitive than ACQ-6 to assess asthma control, 59 and (iii) it incompletely represents the deficits of quality of life in severe asthma. 60 Therefore, CARAT shows good correlation with asthma PROMs, even in patients without AR. Importantly, we included only studies assessing the original recall period of 4 weeks. One study validated CARAT to be used with a 1-week recall period. 25 Although not included in this systematic review, its results are consistent with those described here. This study also has important strengths.
Although previous studies performed systematic reviews of PROMs used in asthma 13 and AR, 56 they did not pool quantitative evidence by meta-analysis and did not follow the COSMIN methodology recommendations.
One additional systematic review performed meta-analysis on the accuracy of the ACT and the ACQ, 55 but it does not follow the COSMIN recommendations and takes a diagnostic performance approach assessing only sensitivity, specificity, likelihood ratios, diagnostic odds ratio and area under the ROC curve. Therefore, our study is the first systematic review to follow the COSMIN guidelines and to qualitatively and quantitatively assess the measurement properties of a PROM used in asthma or AR, thereby setting a stronger level of evidence for asthma and/or AR control questionnaires. The obtained evidence supports the use of CARAT to assess the control of asthma and AR in clinical practice. Another strength of our study is the inclusion in the literature search of cross-referencing using Google Scholar, in order not to miss any relevant publications meeting the inclusion criteria of our review. Additionally, two independent authors were involved in all steps of this review, which was especially relevant for assessing the content validity of CARAT. We were able to perform meta-analyses on several properties of the questionnaire, thus better summarizing the evidence on the measurement properties of CARAT. Importantly, especially for convergent validity, heterogeneity was, overall, low.
In conclusion, this systematic review with meta-analysis summarises for the first time, both qualitatively and quantitatively, the measurement properties of a control questionnaire for asthma or AR.
We observed moderate quality evidence for construct validity and responsiveness of CARAT. We also report high internal consistency and reliability, although this is based on lower quality of evidence, mostly reflecting heterogeneity in the underlying primary studies.
These results indicate that CARAT can be successfully used to assess the control of asthma and AR with a 4-week recall. Still, more research is needed on the use of CARAT in patients diagnosed solely with asthma or AR. We also identified the need for synthesis research on the measurement properties of other PROMs available for asthma and AR.

AUTHOR CONTRIBUTIONS
Rafael José Vieira participated in data extraction, evidence analysis and manuscript writing; Bernardo Sousa-Pinto participated in